Semantic Trading: Agentic AI for Clustering and Relationship Discovery in Prediction Markets

Capponi, Agostino, Gliozzo, Alfio, Zhu, Brian

arXiv.org Artificial Intelligence

Prediction markets allow users to trade on outcomes of real-world events, but are prone to fragmentation, with overlapping questions, implicit equivalences, and hidden contradictions across markets. We present an agentic AI pipeline that autonomously (i) clusters markets into coherent topical groups using natural-language understanding over contract text and metadata, and (ii) identifies within-cluster market pairs whose resolved outcomes exhibit strong dependence, including "same-outcome" (correlated) and "different-outcome" (anti-correlated) relationships. Using a historical dataset of resolved markets on Polymarket, we evaluate the accuracy of the agent's relational predictions. We then synthesize the discovered relationships into a simple trading strategy to quantify how they translate into actionable strategies. Results show that agent-identified relationships have around 60-70% accuracy, and their induced trading strategies have an average return of 20% over week-long horizons, highlighting the ability of agentic AI and large language models to uncover latent semantic structure within prediction markets.
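As a rough illustration of the clustering step, markets can be grouped by cosine similarity of bag-of-words vectors built from question text. This is a minimal sketch only: the paper's agent uses LLM-based natural-language understanding over contract text and metadata, and the similarity threshold below is an arbitrary assumption, not a value from the paper.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_markets(questions, threshold=0.4):
    """Greedily assign each market to the first cluster whose seed
    question is similar enough; otherwise start a new cluster."""
    vecs = [Counter(q.lower().split()) for q in questions]
    clusters = []  # each cluster is a list of market indices
    for i, v in enumerate(vecs):
        for c in clusters:
            if cosine(v, vecs[c[0]]) >= threshold:
                c.append(i)
                break
        else:
            clusters.append([i])
    return clusters
```

Two near-duplicate questions land in one cluster while an unrelated one starts its own, which is the kind of topical grouping the pipeline's first stage produces.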


Shared Parameter Subspaces and Cross-Task Linearity in Emergently Misaligned Behavior

Arturi, Daniel Aarao Reis, Zhang, Eric, Ansah, Andrew, Zhu, Kevin, Panda, Ashwinee, Balwani, Aishwarya

arXiv.org Artificial Intelligence

Recent work has discovered that large language models can develop broadly misaligned behaviors after being fine-tuned on narrowly harmful datasets, a phenomenon known as emergent misalignment (EM). However, the fundamental mechanisms enabling such harmful generalization across disparate domains remain poorly understood. In this work, we adopt a geometric perspective to study EM and demonstrate that it exhibits a fundamental cross-task linear structure in how harmful behavior is encoded across different datasets. Specifically, we find a strong convergence in EM parameters across tasks, with the fine-tuned weight updates showing relatively high cosine similarities, as well as shared lower-dimensional subspaces as measured by their principal angles and projection overlaps. Furthermore, we show functional equivalence via linear mode connectivity, wherein interpolated models across narrow misalignment tasks maintain coherent, broadly misaligned behavior. Our results indicate that EM arises from different narrow tasks discovering the same set of shared parameter directions, suggesting that harmful behaviors may be organized into specific, predictable regions of the weight landscape. By revealing this fundamental connection between parametric geometry and behavioral outcomes, we hope our work catalyzes further research on parameter space interpretability and weight-based interventions.
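The geometric quantities mentioned above (cosine similarity of weight updates and principal angles between their subspaces) can be sketched with plain linear algebra. Here `dA` and `dB` are hypothetical stand-ins for two tasks' fine-tuned weight-update matrices, not the paper's actual checkpoints, and the subspace rank `k` is an illustrative choice.

```python
import numpy as np

def flat_cosine(dA: np.ndarray, dB: np.ndarray) -> float:
    """Cosine similarity between two weight updates, flattened to vectors."""
    a, b = dA.ravel(), dB.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def principal_angles(dA: np.ndarray, dB: np.ndarray, k: int = 2) -> np.ndarray:
    """Principal angles (radians) between the top-k left singular
    subspaces of two weight-update matrices."""
    Ua = np.linalg.svd(dA, full_matrices=False)[0][:, :k]
    Ub = np.linalg.svd(dB, full_matrices=False)[0][:, :k]
    s = np.linalg.svd(Ua.T @ Ub, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))
```

When the two updates span the same directions, the cosine similarity is 1 and all principal angles are 0; diverging tasks would push the angles toward pi/2.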


Heartificial Intelligence: Exploring Empathy in Language Models

Williams, Victoria, Rosman, Benjamin

arXiv.org Artificial Intelligence

Large language models have become increasingly common, used by millions of people worldwide in both professional and personal contexts. As these models continue to advance, they are frequently serving as virtual assistants and companions. In human interactions, effective communication typically involves two types of empathy: cognitive empathy (understanding others' thoughts and emotions) and affective empathy (emotionally sharing others' feelings). In this study, we investigated both cognitive and affective empathy across several small (SLMs) and large (LLMs) language models using standardized psychological tests. Our results revealed that LLMs consistently outperformed humans - including psychology students - on cognitive empathy tasks. However, despite their cognitive strengths, both small and large language models showed significantly lower affective empathy compared to human participants. These findings highlight rapid advancements in language models' ability to simulate cognitive empathy, suggesting strong potential for providing effective virtual companionship and personalized emotional support. Additionally, their high cognitive yet lower affective empathy allows objective and consistent emotional support without running the risk of emotional fatigue or bias.


Artificial Finance: How AI Thinks About Money

Erdem, Orhan, Ashok, Ragavi Pobbathi

arXiv.org Artificial Intelligence

In this paper, we explore how large language models (LLMs) approach financial decision-making by systematically comparing their responses to those of human participants across the globe. We posed a set of commonly used financial decision-making questions to seven leading LLMs, including five models from the GPT series (GPT-4o, GPT-4.5, o1, o3-mini), Gemini 2.0 Flash, and DeepSeek R1. We then compared their outputs to human responses drawn from a dataset covering 53 nations. Our analysis reveals three main results. First, LLMs generally exhibit a risk-neutral decision-making pattern, favoring choices aligned with expected value calculations when faced with lottery-type questions. Second, when evaluating trade-offs between present and future, LLMs occasionally produce responses that appear inconsistent with normative reasoning. Third, when we examine cross-national similarities, we find that the LLMs' aggregate responses most closely resemble those of participants from Tanzania. These findings contribute to the understanding of how LLMs emulate human-like decision behaviors and highlight potential cultural and training influences embedded within their outputs.


OSU-Wing PIC Phase I Evaluation: Baseline Workload and Situation Awareness Results

Adams, Julie A., Sanchez, Christopher A., Mallampati, Vivek, Smith, Joshua Bhagat, Burgess, Emily, Dassonville, Andrew

arXiv.org Artificial Intelligence

The common theory is that a human pilot's performance degrades as the number of uncrewed aircraft systems (UAS) they are responsible for increases. This theory was developed in the early 2010s for ground robots, not highly autonomous UAS. It has been shown that increasing autonomy can mitigate some performance impacts associated with increasing the number of UAS. Overall, the Oregon State University-Wing collaboration seeks to understand what factors negatively impact a pilot's ability to maintain responsibility and control over an assigned set of active UAS. The Phase I evaluation establishes baseline data focused on how increases in the number of UAS and the number of nests affect pilot performance. This evaluation covers nominal operations as well as crewed aircraft encounters and adverse weather changes. The results demonstrate that the pilots were actively engaged and had very good situation awareness. Manipulating the conditions did not result in any significant differences in overall workload. The overall results debunk the theory that increasing the number of UAS is detrimental to a pilot's performance.


Fast Online Changepoint Detection

Ghezzi, Fabrizio, Rossi, Eduardo, Trapani, Lorenzo

arXiv.org Machine Learning

We study online changepoint detection in the context of a linear regression model. We propose a class of heavily weighted statistics based on the CUSUM process of the regression residuals, which are specifically designed to ensure timely detection of breaks occurring early on during the monitoring horizon. We subsequently propose a class of composite statistics, constructed using different weighting schemes; the decision rule to mark a changepoint is based on the largest statistic across the various weights, thus effectively working like a veto-based voting mechanism, which ensures fast detection irrespective of the location of the changepoint. Our theory is derived under a very general form of weak dependence, thus being able to apply our tests to virtually all time series encountered in economics, medicine, and other applied sciences. Monte Carlo simulations show that our methodologies are able to control the procedure-wise Type I Error, and have short detection delays in the presence of breaks.
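A toy version of the veto rule can be sketched as follows. This is a simplified illustration, not the paper's procedure: residuals are assumed observed directly, the scale estimate is crude, and the weighting exponents and critical value are arbitrary placeholders rather than calibrated values.

```python
import math

def weighted_cusum_flags(residuals, gammas=(0.0, 0.25, 0.45), crit=3.0):
    """Scan residuals online; at each time t compute one weighted CUSUM
    statistic per exponent gamma, and flag a changepoint as soon as the
    LARGEST statistic across gammas (the veto rule) exceeds crit.
    Returns the detection time, or None if no break is flagged."""
    n = len(residuals)
    sigma = (sum(r * r for r in residuals) / n) ** 0.5  # crude scale estimate
    cum = 0.0
    for t in range(1, n + 1):
        cum += residuals[t - 1]
        stats = [abs(cum) / (sigma * math.sqrt(n) * (t / n) ** g)
                 for g in gammas]
        if max(stats) > crit:
            return t
    return None
```

Larger exponents inflate the statistic early in the horizon, so the max-over-weights rule fires quickly on early breaks while the unweighted statistic covers late ones.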


"Do it my way!": Impact of Customizations on Trust perceptions in Human-Robot Collaboration

Kapoor, Parv, Chu, Simon, Chen, Angela

arXiv.org Artificial Intelligence

Trust has been shown to be a key factor in effective human-robot collaboration. In the context of assistive robotics, the effect of trust factors on human experience is further pronounced. Personalization of assistive robots is an orthogonal factor positively correlated with robot adoption and user perceptions. In this work, we investigate the relationship between these factors through a within-subjects study (N=17). We provide different levels of customization over baseline autonomous robot behavior and investigate their impact on trust. Our findings indicate that increased levels of customization were associated with higher trust and comfort perceptions. The assistive robot design process can benefit significantly from our insights for designing trustworthy and customizable robots.


Robot Navigation in Risky, Crowded Environments: Understanding Human Preferences

Suresh, Aamodh, Taylor, Angelique, Riek, Laurel D., Martinez, Sonia

arXiv.org Artificial Intelligence

Risky and crowded environments (RCEs) contain abstract sources of risk and uncertainty, which are perceived differently by humans, leading to a variety of behaviors. Thus, robots deployed in RCEs need to exhibit diverse perception and planning capabilities in order to interpret other human agents' behavior and act accordingly in such environments. To understand this problem domain, we conducted a study to explore human path choices in RCEs, enabling better robotic navigational explainable AI (XAI) designs. We created a novel COVID-19 pandemic grocery shopping scenario with time-risk tradeoffs, and acquired users' path preferences. We found that participants showcase a variety of path preferences, from risky and urgent to safe and relaxed. To model users' decision making, we evaluated three popular risk models: Cumulative Prospect Theory (CPT), Conditional Value at Risk (CVaR), and Expected Risk (ER). We found that CPT captured people's decision making more accurately than CVaR and ER, corroborating theoretical results that CPT is more expressive and inclusive than CVaR and ER. We also found that people's self-assessments of risk and time-urgency do not correlate with their path preferences in RCEs. Finally, we conducted thematic analysis of open-ended questions, providing crucial design insights for robots in RCEs. Thus, through this study, we provide novel and critical insights about human behavior and perception to help design better navigational explainable AI (XAI) in RCEs.
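For concreteness, the three risk models can be sketched for a simple discrete lottery. The parameter values below (the CVaR level and the CPT probability-weighting exponent) are illustrative defaults from the literature, not the study's fitted values.

```python
def expected_value(outcomes, probs):
    """Expected Risk (ER)-style evaluation: plain expectation."""
    return sum(x * p for x, p in zip(outcomes, probs))

def cvar(losses, probs, alpha=0.9):
    """Conditional Value at Risk: mean loss in the worst (1 - alpha) tail."""
    pairs = sorted(zip(losses, probs), reverse=True)  # worst losses first
    tail = 1.0 - alpha
    acc, remaining = 0.0, tail
    for loss, p in pairs:
        take = min(p, remaining)
        acc += loss * take
        remaining -= take
        if remaining <= 0:
            break
    return acc / tail

def cpt_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting function used in CPT;
    gamma < 1 overweights small probabilities."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))
```

Unlike ER and CVaR, CPT distorts probabilities before aggregating, which is one reason it can fit human path choices that over-react to small risks.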


Solving Spotify Multiclass Genre Classification Problem

#artificialintelligence

The music industry has become more popular, and the way people listen to music is changing rapidly. The development of music streaming services has increased the demand for automatic music categorization and recommendation systems. Spotify, one of the world's leading music streaming sites, has millions of subscribers and a massive song catalog. Yet, for customers to have a personalized music experience, Spotify must recommend tracks that fit their preferences. Spotify uses machine learning algorithms to categorize music by genre and guide its recommendations.


Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering

Esmaeilzadeh, Soheil, Williams, Brian, Shamsi, Davood, Vikingstad, Onar

arXiv.org Artificial Intelligence

Teachers often conduct surveys in order to collect data from a predefined group of students to gain insights into topics of interest. When analyzing surveys with open-ended textual responses, it is extremely time-consuming, labor-intensive, and difficult to manually process all the responses into an insightful and comprehensive report. In the analysis step, traditionally, the teacher has to read each of the responses and decide on how to group them in order to extract insightful information. Even though it is possible to group the responses only using certain keywords, such an approach would be limited since it not only fails to account for embedded contexts but also cannot detect polysemous words or phrases and semantics that are not expressible in single words. In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data. Our framework relies on a pre-trained natural language model in order to encode the textual data into semantic vectors. The encoded vectors then get clustered either into an optimally tuned number of groups or into a set of groups with pre-specified titles. In the former case, the clusters are then further analyzed to extract a representative set of keywords or summary sentences that serve as the labels of the clusters. In our framework, for the designated clusters, we finally provide context-aware wordclouds that demonstrate the semantically prominent keywords within each group. Honoring user privacy, we have successfully built the on-device implementation of our framework suitable for real-time analysis on mobile devices and have tested it on a synthetic dataset. Our framework reduces costs at scale by automating the process of extracting the most insightful information pieces from survey data.
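The keyword-labeling step for already-formed clusters can be illustrated with TF-IDF-style scoring. This is a minimal sketch only: the paper's framework encodes responses with a pre-trained language model and tunes the number of clusters, both of which are omitted here.

```python
import math
from collections import Counter

def top_keywords_per_cluster(clusters, n=3):
    """clusters: list of clusters, each a list of response strings.
    Treat each cluster as one document, score its terms by count times a
    smoothed inverse document frequency, and return the top-n terms per
    cluster as candidate labels."""
    docs = [Counter(w for resp in cl for w in resp.lower().split())
            for cl in clusters]
    df = Counter()  # in how many clusters each term appears
    for d in docs:
        df.update(d.keys())
    N = len(docs)
    labels = []
    for d in docs:
        scored = {t: c * math.log((1 + N) / (1 + df[t])) for t, c in d.items()}
        labels.append([t for t, _ in
                       sorted(scored.items(), key=lambda kv: -kv[1])[:n]])
    return labels
```

Terms that are frequent within a cluster but rare across clusters rank highest, which is the same intuition behind the framework's context-aware wordclouds.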